Large Alphabet Coding and Prediction through Poissonization and Tilting
Abstract
This paper introduces a convenient strategy for compression and prediction of sequences of independent, identically distributed random variables generated from a large alphabet of size m. In particular, the size of the sample is allowed to be variable. The employment of a Poisson model and tilting method simplifies the implementation and analysis through independence. The resulting strategy is optimal within the class of distributions satisfying a moment condition, and is close to optimal for a smaller class – the class of distributions with an analogous condition on the counts. Moreover, the method can be used to code and predict sequences in a subset with the tail counts satisfying a given condition, and it can also be applied to envelope classes.
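The independence that the abstract attributes to the Poisson model can be checked numerically. The sketch below is not the paper's coding strategy; it only illustrates the standard Poissonization fact that when the sample size is Poisson with mean `lam`, the symbol counts are independent Poisson variables with means `lam * p_j`. The names `probs`, `lam`, and `counts`, and all helper functions, are illustrative assumptions and do not come from the paper.

```python
import math

def poisson_pmf(k, mean):
    """Probability that a Poisson(mean) variable equals k."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def multinomial_pmf(counts, probs):
    """Probability of the count vector given a fixed sample size sum(counts)."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)  # stays an integer at every step
    p = float(coef)
    for c, q in zip(counts, probs):
        p *= q**c
    return p

def poissonized_joint_pmf(counts, probs, lam):
    """Joint pmf of the counts when the sample size N is Poisson(lam)."""
    return poisson_pmf(sum(counts), lam) * multinomial_pmf(counts, probs)

def product_of_poissons(counts, probs, lam):
    """Product of independent Poisson(lam * p_j) pmfs, one per symbol."""
    out = 1.0
    for c, q in zip(counts, probs):
        out *= poisson_pmf(c, lam * q)
    return out

# Hypothetical three-symbol alphabet and expected sample size.
probs = [0.5, 0.3, 0.2]
lam = 4.0
counts = (2, 1, 0)

lhs = poissonized_joint_pmf(counts, probs, lam)
rhs = product_of_poissons(counts, probs, lam)
print(math.isclose(lhs, rhs, rel_tol=1e-12))
```

Because the joint probability factorizes over symbols, each count can be analyzed separately, which is the simplification the abstract refers to.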
Related resources
Large Alphabet Compression and Predictive Distributions through Poissonization and Tilting
This paper introduces a convenient strategy for coding and predicting sequences of independent, identically distributed random variables generated from a large alphabet of size m. In particular, the size of the sample is allowed to be variable. The employment of a Poisson model and tilting method simplifies the implementation and analysis through independence. The resulting strategy is optimal ...
A large-alphabet-oriented scheme for Chinese and English text compression
In this paper, a large alphabet oriented scheme is proposed for both Chinese and English text compression. Our scheme parses Chinese text with the alphabet defined by Big-5 code, and parses English text with some rules designed here. Thus, the alphabet used for English is not a word alphabet. After parsed out into tokens, zero, first, and second order Markov models are used to estimate the occu...
Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding
The problem of predicting a sequence x1, x2, · · · generated by a discrete source with unknown statistics is considered. Each letter xt+1 is predicted using information on the word x1x2 · · · xt only. In fact, this problem is a classical problem which has received much attention. Its history can be traced back to Laplace. We address the problem where each xi belongs to some large (or even infin...
Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding
The problem of predicting a sequence x1, x2, · · · generated by a discrete source with unknown statistics is considered. Each letter xt+1 is predicted using the information on the word x1x2 · · · xt only. This problem is of great importance for data compression, because of its use to estimate probability distributions for PPM algorithms and other adaptive codes. On the other hand, such predicti...
Alphabet Partitioning Techniques for Semi-Adaptive Huffman Coding of Large Alphabets
Practical applications that employ entropy coding for large alphabets often partition the alphabet set into two or more layers and encode each symbol by using some suitable prefix coding for each layer. In this paper, we formulate the problem of finding an alphabet partitioning for the design of a two-layer semi-adaptive code as an optimization problem, and give a solution based on dynamic prog...